-
We study the differentially private (DP) empirical risk minimization (ERM) problem under the semi-sensitive DP setting where only some features are sensitive. This generalizes the Label DP setting where only the label is sensitive. We give improved upper and lower bounds on the excess risk for DP-ERM. In particular, we show that the error only scales polylogarithmically in terms of the sensitive domain size, improving upon previous results that scale polynomially in the sensitive domain size.
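The abstract leaves the algorithm itself to the full paper. As a minimal sketch of what a DP-ERM procedure looks like in general, here is the standard DP-SGD recipe (clipped per-example gradients plus Gaussian noise) for logistic regression; this is a generic baseline, not the paper's semi-sensitive method, and parameter names like clip_norm and noise_mult are illustrative.

```python
import numpy as np

def dp_sgd_logistic(X, y, steps=200, lr=0.1, clip_norm=1.0,
                    noise_mult=1.0, seed=None):
    """Generic DP-ERM baseline: noisy clipped-gradient descent on logistic loss.

    NOT the paper's semi-sensitive algorithm; shown only to make the
    DP-ERM objective concrete. Labels y are assumed to be in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        # per-example gradients of log(1 + exp(-y * w.x)), shape (n, d)
        grads = (-y / (1.0 + np.exp(margins)))[:, None] * X
        # clip each example's gradient to L2 norm <= clip_norm
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        # average, then add Gaussian noise scaled to the clipping bound
        noisy = grads.mean(axis=0) + rng.normal(
            scale=noise_mult * clip_norm / n, size=d)
        w -= lr * noisy
    return w
```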
-
Traditional models of supervised learning require a learner, given examples from an arbitrary joint distribution on ℝ^d × {±1}, to output a hypothesis that competes (to within ε) with the best-fitting concept from a class. To overcome hardness results for learning even simple concept classes, this paper introduces a smoothed-analysis framework that only requires competition with the best classifier robust to small random Gaussian perturbations. This subtle shift enables a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (multi-index model) and (2) has bounded Gaussian surface area. This class includes functions of halfspaces and low-dimensional convex sets, which are only known to be learnable in non-smoothed settings with respect to highly structured distributions like Gaussians. The analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, the authors present the first algorithm for agnostically learning intersections of k halfspaces in time k^{poly(log k / εγ)}, where γ is the margin parameter. Previously, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
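To make the smoothed benchmark concrete: the learner is only asked to compete with the best concept after its predictions are made robust to small Gaussian perturbations of the input. Below is a minimal sketch of estimating such a smoothed benchmark's error; the majority-vote construction and the parameters sigma and n_perturb are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def smoothed_error(f, X, y, sigma=0.1, n_perturb=100, seed=None):
    """Estimate the error of a Gaussian-smoothed version of concept f.

    The smoothed benchmark replaces f(x) with a majority vote of
    f(x + sigma * z) over perturbations z ~ N(0, I). f is assumed to
    map a batch of points to labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    votes = np.zeros(n)
    for _ in range(n_perturb):
        Z = rng.standard_normal((n, d))
        votes += f(X + sigma * Z)
    smoothed_pred = np.sign(votes)   # majority vote = smoothed concept
    return np.mean(smoothed_pred != y)

# usage: smoothed error of a halfspace benchmark on data (X, y)
# halfspace = lambda pts: np.sign(pts @ w_true)
# err = smoothed_error(halfspace, X, y, sigma=0.05)
```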
-
It is well-known that the statistical performance of Lasso can suffer significantly when the covariates of interest have strong correlations. In particular, the prediction error of Lasso becomes much worse than computationally inefficient alternatives like Best Subset Selection. Due to a large conjectured computational-statistical tradeoff in the problem of sparse linear regression, it may be impossible to close this gap in general. In this work, we propose a natural sparse linear regression setting where strong correlations between covariates arise from unobserved latent variables. In this setting, we analyze the problem caused by strong correlations and design a surprisingly simple fix. While Lasso with standard normalization of covariates fails, there exists a heterogeneous scaling of the covariates with which Lasso will suddenly obtain strong provable guarantees for estimation. Moreover, we design a simple, efficient procedure for computing such a "smart scaling." The sample complexity of the resulting "rescaled Lasso" algorithm incurs (in the worst case) quadratic dependence on the sparsity of the underlying signal. While this dependence is not information-theoretically necessary, we give evidence that it is optimal among the class of polynomial-time algorithms, via the method of low-degree polynomials. This argument reveals a new connection between sparse linear regression and a special version of sparse PCA with a near-critical negative spike. The latter problem can be thought of as a real-valued analogue of learning a sparse parity. Using it, we also establish the first computational-statistical gap for the closely related problem of learning a Gaussian Graphical Model.
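The abstract does not spell out how the "smart scaling" is computed, so the sketch below only shows the shape of the pipeline: given some per-covariate scaling (taken as an input here, standing in for the paper's estimation procedure), rescale the columns, run off-the-shelf Lasso, and map the coefficients back to the original covariates.

```python
import numpy as np
from sklearn.linear_model import Lasso

def rescaled_lasso(X, y, scales, alpha=0.1):
    """Lasso with a heterogeneous per-covariate scaling.

    `scales` stands in for the paper's "smart scaling"; its computation
    is not given in the abstract. Rescaling a column changes the
    effective L1 penalty that covariate pays, which is the mechanism
    the abstract describes.
    """
    Xs = X * scales[None, :]          # heterogeneous column scaling
    model = Lasso(alpha=alpha).fit(Xs, y)
    # y ~ Xs @ c = X @ (scales * c), so undo the scaling on the way out
    return model.coef_ * scales

# usage (scales would come from the paper's procedure):
# beta_hat = rescaled_lasso(X, y, scales=my_scales, alpha=0.05)
```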
-
A core component present in many successful neural network architectures is an MLP block of two fully connected layers with a non-linear activation in between. An intriguing phenomenon observed empirically, including in transformer architectures, is that, after training, the activations in the hidden layer of this MLP block tend to be extremely sparse on any given input. Unlike traditional forms of sparsity, where there are neurons/weights which can be deleted from the network, this form of dynamic activation sparsity appears to be harder to exploit to get more efficient networks. Motivated by this, we initiate a formal study of PAC learnability of MLP layers that exhibit activation sparsity. We present a variety of results showing that such classes of functions do lead to provable computational and statistical advantages over their non-sparse counterparts. Our hope is that a better theoretical understanding of sparsely activated networks would lead to methods that can exploit activation sparsity in practice.
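A minimal sketch of the phenomenon being formalized: a standard two-layer MLP block, plus a measurement of the per-input ("dynamic") sparsity of its hidden activations. The function names are illustrative, not from the paper.

```python
import numpy as np

def mlp_block(x, W1, b1, W2, b2):
    """Standard MLP block: Linear -> ReLU -> Linear. Returns output and
    the hidden activations so sparsity can be inspected."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden-layer activations
    return h @ W2 + b2, h

def activation_sparsity(X, W1, b1, W2, b2):
    """Fraction of hidden units that are exactly zero, averaged over inputs.

    This is the dynamic sparsity the abstract refers to: different inputs
    activate different small subsets of neurons, so no single neuron can
    simply be pruned from the network.
    """
    _, H = mlp_block(X, W1, b1, W2, b2)
    return np.mean(H == 0.0)
```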
-
In the well-studied agnostic model of learning, the goal of a learner, given examples from an arbitrary joint distribution, is to output a hypothesis that is competitive (to within ε) with the best-fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes in this model, we introduce a smoothed analysis framework where we require a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Perhaps surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k halfspaces in time k^{poly(log k / εγ)}, where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).